A scalable empathic-mindset intervention reduces group disparities in school suspensions

Suspensions remove students from the learning environment at high rates throughout the United States. Policy and theory highlight social groups that face disproportionately high suspension rates—racial-minoritized students, students with a prior suspension, and students with disabilities. We used an active placebo-controlled, longitudinal field experiment (Nteachers = 66, Nstudents = 5822) to test a scalable “empathic-mindset” intervention, a 45- to 70-min online exercise to refocus middle school teachers on understanding and valuing the perspectives of students and on sustaining positive relationships even when students misbehave. In preregistered analyses, this exercise reduced suspension rates especially for Black and Hispanic students, cutting the racial disparity over the school year from 10.6 to 5.9 percentage points, a 45% reduction. Significant reductions were also observed for other groups of concern. Moreover, reductions persisted through the next year when students interacted with different teachers, suggesting that empathic treatment with even one teacher in a critical period can improve students’ trajectories through school.

Non-assigned teachers differed in several ways from assigned teachers (see Table S8). With the exception of the main effect of condition, all analyses focus on students of the 66 teachers who were randomized to condition in Fall 2017 (ITT analyses). There were no exclusions of teachers based on engagement with or completion of the materials.
The school district provided discipline records, course records, and demographic records for all students in all district middle schools for the year before (AY 2016-2017), during (AY 2017-2018), and after (AY 2018-2019) the intervention, including the 13,210 seventh and eighth grade students these teachers taught. However, there were occasional missing data.

Analytical Approach
All analyses were conducted in Stata Version 16.1 using multilevel linear probability random effects models (implemented by the mixed regression command) in which students were nested within math course title-teacher groups (e.g., classrooms) within teachers within schools.
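As a rough sketch of the model structure described above, and not the authors' exact specification, a four-level random-intercept linear probability model for the ITT sample could be written as follows. The symbols are illustrative assumptions: $Y_{icts}$ is the 0/1 suspension indicator for student $i$ in classroom $c$ of teacher $t$ in school $s$, $\mathbf{x}_{icts}$ collects the covariates, and random intercepts only are assumed at each level.

```latex
Y_{icts} = \beta_0 + \beta_1\,\mathrm{Control}_{icts}
         + \mathbf{x}_{icts}^{\top}\boldsymbol{\gamma}
         + v_{s} + u_{ts} + w_{cts} + \varepsilon_{icts},
\qquad
v_{s} \sim N(0,\sigma^2_{v}),\;
u_{ts} \sim N(0,\sigma^2_{u}),\;
w_{cts} \sim N(0,\sigma^2_{w}),\;
\varepsilon_{icts} \sim N(0,\sigma^2_{\varepsilon})
```

With the treatment condition coded 0, $\beta_0$ is the treatment-condition mean at the covariate reference values and $\beta_1$ is the control-minus-treatment difference in the probability of suspension.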
The goal was to understand the effect of treatment on students' year-long probability of suspension as a function of whether teachers were exposed to either treatment or control materials in semester 1, the Fall 2017 semester. A student was considered to be in the treatment condition if the student had at least one math course in Fall 2017 in which the math teacher was randomly assigned to the treatment that semester. A student was considered to be in the control condition if the student had no math courses in Fall 2017 in which the teacher was randomly assigned to the treatment in Fall 2017 and at least one math course in Fall 2017 in which the math teacher was randomly assigned to the control in Fall 2017.
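The assignment rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (the analyses were run in Stata); the function name and the input format, a list with one entry per Fall 2017 math course, are assumptions.

```python
def student_condition(fall_teacher_conditions):
    """Classify a student from the conditions of their Fall 2017 math teachers.

    `fall_teacher_conditions` is a hypothetical list with one entry per
    Fall 2017 math course: "treatment", "control", or None when that
    teacher was not randomized to condition in Fall 2017.
    """
    # Any treatment-assigned Fall 2017 math teacher puts the student in treatment.
    if "treatment" in fall_teacher_conditions:
        return "treatment"
    # Otherwise, at least one control-assigned teacher puts the student in control.
    if "control" in fall_teacher_conditions:
        return "control"
    # No randomized Fall 2017 math teacher (or no Fall 2017 math course at all).
    return "non-assigned"


print(student_condition(["control", "treatment"]))  # treatment
print(student_condition(["control", None]))         # control
print(student_condition([]))                        # non-assigned
```

Under this rule, a student with one control teacher and one treatment teacher counts as treated, which matches the "at least one" definition in the text.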
In the ITT sample (N=5,822), the goal was to compare only students in randomized treatment versus control conditions. To achieve this, all students without a fall-semester math course taught by a math teacher randomized to condition in the fall were excluded from analyses. In the larger sample (N=13,210; hereafter the ALL sample), where the goal was to increase power for the focal comparison and compare grade 7 and 8 students in both randomly assigned conditions to students of all other math teachers at the participating schools, these students were included as a "non-assigned" condition. Non-assigned students could include 1) students whose Fall-2017 math courses had no math teachers assigned to condition in Fall 2017 (because these teachers never started the study materials, because these teachers never reached the point of random assignment, or because they only started the materials in Spring 2018), 2) students who had no Fall-2017 math courses (only Spring-2018 math courses), or 3) students whose first exposure to an assigned math teacher(s) was in the spring semester (only their spring semester math courses were taught by assigned math teachers). For the main effect of condition only, the manuscript text reports results obtained in both the ITT sample (N=5,822) and the larger sample including non-assigned students (N=13,210). All other results reported reflect the ITT sample.
The treatment condition served as the reference category in typical models (except for specific purposes, such as to compute the main effect of race in the control condition). The effect of condition was represented by a single contrast in treatment-versus-control-only models (1=Control; 0=Treatment) and by two contrasts in treatment versus control and non-assigned models (1=Control, 0=Otherwise; 1=Non-Assigned, 0=Otherwise; hence the treatment condition has a code of 0 for both indicators). All models controlled for student-level dichotomous race-ethnicity, gender, and suspension status in the previous school year (0=No, student received no days on suspension in the 2016-2017 school year; 1=Yes, student received at least one day on suspension in the 2016-2017 school year). Additionally, we controlled for continuous average suspension rates in the previous school year at the course-level (district wide, averaging across schools that offered the math course and all district teachers who taught the math course) and teacher-level (averaging across all students taught by the teacher, in any math course) (see below). When students had multiple math courses (because they had multiple courses with the same teacher or multiple math teachers), we computed an average previous suspension rate across the student's courses for the course-level and teacher-level previous suspension rates, respectively. We used three separate missing value indicators to indicate whether students were missing data for previous suspensions (0 = No, 1 = Yes), one for each of the three previous suspension covariates (student-level, course-level, and teacher-level). We did not have missing data for any other covariates.
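The condition contrasts and the averaged prior-suspension covariates described above might be constructed as in the following Python sketch. All names are hypothetical, and the value stored when a covariate is missing (0.0 here) is an assumption: the text states only that a separate missing value indicator was used, not how the missing value itself was filled.

```python
def condition_contrasts(condition):
    """Return (control, non_assigned) indicators; treatment is (0, 0)."""
    control = 1 if condition == "control" else 0
    non_assigned = 1 if condition == "non-assigned" else 0
    return control, non_assigned


def average_prior_rate(rates):
    """Average a prior-year suspension rate across a student's math courses.

    `rates` holds one course-level (or teacher-level) rate per course, with
    None where no prior-year rate is available. Returns (value, missing),
    where missing is the 0/1 missing value indicator.
    """
    observed = [r for r in rates if r is not None]
    if not observed:
        # ASSUMPTION: placeholder of 0.0 when nothing is observed; the
        # missing indicator then absorbs these cases in the model.
        return 0.0, 1
    return sum(observed) / len(observed), 0


print(condition_contrasts("treatment"))   # (0, 0)
print(average_prior_rate([0.25, 0.75]))   # (0.5, 0)
print(average_prior_rate([None]))         # (0.0, 1)
```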
In all models (both excluding and including non-assigned students), we mean-centered course-level and teacher-level prior suspensions on the course-level mean and teacher mean, respectively, for math courses and teachers in the ITT sample so the intercept would represent the treatment-condition mean for a student in a course with average levels of previous suspensions relative to all ITT-sample math courses and a teacher with average levels of previous suspensions relative to all ITT-sample math teachers. Consistent with prior research (35), we mean-centered student-level prior suspensions on the mean for a student's race-ethnicity × gender group, so the intercept in the model would represent the treatment-condition mean for the average student in a given race-ethnicity × gender group, given that suspension rates differ across such groups. When they functioned primarily as a covariate in the model, as in analyses assessing the main effect of condition, the remaining dichotomous indicators (student race-ethnicity, gender, and missing value indicators) were mean-centered on the student-level mean for that variable. When the goal was to test an interaction between a specific dichotomous variable or variables (e.g., student race-ethnicity, gender, or previous suspension status) and condition, the relevant dichotomous indicator was left as a 0/1 variable and allowed to interact with each condition contrast (while all other dichotomous indicators were student-mean-centered). The analysis assessing the interaction between condition and student special education status (0=No disability; 1=At least one disability) added a 0/1 predictor for special education status and its interaction with condition to a model in which all other student-level predictors (race-ethnicity, gender, and previous suspension status) were student-mean-centered. We computed simple effects of condition using the margins command. These reflect the control versus treatment difference (or non-assignment versus treatment difference) in the probability of suspension when the student group of interest was defined as 0 for the moderator and had average values for all other covariates. Thus, positive coefficient values reflect lower treatment means relative to control or non-assigned means and correspond to treatment reductions.
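The group-mean centering of student-level prior suspensions can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions: the record keys (`race_ethnicity`, `gender`, `prior_suspension`) and the toy data are hypothetical, and students with missing prior-suspension data are not handled.

```python
from collections import defaultdict


def center_within_groups(students, value_key, group_keys):
    """Return each student's value minus the mean of that student's group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s in students:
        group = tuple(s[k] for k in group_keys)
        totals[group] += s[value_key]
        counts[group] += 1
    centered = []
    for s in students:
        group = tuple(s[k] for k in group_keys)
        centered.append(s[value_key] - totals[group] / counts[group])
    return centered


# Toy records (hypothetical): prior_suspension is the 0/1 prior-year indicator.
students = [
    {"race_ethnicity": "Black/Hispanic", "gender": "male", "prior_suspension": 1},
    {"race_ethnicity": "Black/Hispanic", "gender": "male", "prior_suspension": 0},
    {"race_ethnicity": "White", "gender": "female", "prior_suspension": 0},
    {"race_ethnicity": "White", "gender": "female", "prior_suspension": 0},
]
centered = center_within_groups(
    students, "prior_suspension", ("race_ethnicity", "gender")
)
print(centered)  # [0.5, -0.5, 0.0, 0.0]
```

After centering, a value of 0 means "average for that race-ethnicity × gender group," which is what lets the intercept be read as the treatment-condition mean for the average student in a given group.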

Measures and coding methods
Dependent variable. Consistent with previous research (23), the primary dependent variable was whether a student received at least one suspension day (=1) versus not (=0) as a result of a discipline referral from any teacher or staff member at any time during the 2017-2018 school year, the year of intervention. For students who had school records at district schools in both 2017-2018 and 2018-2019, we also assessed whether a student received at least one suspension day (=1) versus not (=0) during the school year immediately following the intervention year (2018-2019).
Student-level condition. To accommodate the few students who had multiple assigned math teachers, we defined a student as in the treatment condition if they had at least one Fall-2017 math teacher who was randomly assigned to the empathic mindset intervention in Fall 2017. We defined a student as in the control condition if they had at least one Fall-2017 math teacher who was randomly assigned to the control condition in Fall 2017 but no Fall-2017 teachers assigned to the treatment condition that semester. The ALL sample included all other students in the "no-assignment category": these students were present in grade 7 or 8 math courses but did not have a Fall-2017 math teacher who received a random assignment to condition in Fall 2017.
Previous suspension covariates at the student, course, and teacher levels. To control for previous student-level suspensions, we used a single contrast to distinguish students who had received no suspension days in the prior academic year (=0) from students who had received one or more suspension days (=1). Because this district offers a common math curriculum across schools and students in specific types of math classes may have different propensity for receiving suspensions, independent of teacher, we also controlled for the average number of suspensions associated with each math course title across schools in the prior school year. We further controlled for the average suspension rate of students in the prior school year who had the student's same math teacher during the previous school year.
Student demographic covariates. Students belonged to one of six racial-ethnic groups: White, Asian, Black, Hispanic, American Indian, Mixed (two or more races). These racial-ethnic groups differed in both proportional representation and average previous suspension rates in our sample in ways that are consistent with national data. When models simply sought to control for race-ethnicity (e.g., as when testing the main effect of condition), we used two contrasts: Black and Hispanic (=1) versus White (=0) and all other groups (=1) versus White (=0). Here Other indicates Asian, American Indian, or Students with Two or More Races. This approach most closely reflected the proportional representation of racial-ethnic groups in our sample, while also preserving some distinctions between groups in ways that partially reflected prior group discipline patterns. In the 13,210 sample, both Black and Hispanic students had similar proportional representation (16.8% and 17.6% respectively) while all other minority groups each had ≤ 5.3% representation. This was also true in the 5,822 ITT sample (16.5% Black, 14.7% Hispanic, all other minority groups ≤ 6.2% representation respectively). This coding scheme grouped together the two largest minoritized groups in our sample that tend to face disadvantage in disciplinary contexts (Black and Hispanic). Black and Hispanic students also had significantly higher suspensions the year prior to the intervention than White students (See Figure S3). The reference category in regression models with this coding scheme was White students, which was the most numerous racial-ethnic group in the sample (55.7% of the ALL sample and 57.8% of the ITT sample; See Figure S2). Since this model best reflects the nature of this particular sample, we use this racial coding as a mean-centered covariate for race-ethnicity. Gender was coded as female (=1) versus male (=0) and was also student-mean-centered.
Effects by student race-ethnicity and gender. To increase power for tests of interactions between condition and student race-ethnicity, we converted these two dichotomous race-ethnicity variables into a single race-ethnicity contrast that distinguished Black or Hispanic (=1) from not Black or Hispanic students (=0). This coding scheme was more parsimonious and required only one interaction term with condition yet still grouped the two largest minoritized groups in our sample that tend to face disadvantage in disciplinary contexts (Black and Hispanic students). We also report results based on other coding schemes (including the pre-registered Black, Hispanic, American Indian, and Two or More Races vs White or Asian) in supplemental analyses (Tables S14-15).
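The two race-ethnicity coding schemes, the two-contrast covariate version and the single-contrast moderator version, can be written out as a small Python sketch. Function names and group labels are illustrative assumptions; mean-centering of the covariate version is omitted here.

```python
BLACK_OR_HISPANIC = {"Black", "Hispanic"}


def covariate_contrasts(group):
    """Two contrasts with White as the reference category.

    Returns (black_or_hispanic, other), where "other" covers Asian,
    American Indian, and students of two or more races.
    """
    black_or_hispanic = 1 if group in BLACK_OR_HISPANIC else 0
    other = 1 if group not in BLACK_OR_HISPANIC and group != "White" else 0
    return black_or_hispanic, other


def moderator_contrast(group):
    """Single contrast: Black or Hispanic (=1) versus all other students (=0)."""
    return 1 if group in BLACK_OR_HISPANIC else 0


for g in ["White", "Black", "Hispanic", "Asian", "American Indian", "Mixed"]:
    print(g, covariate_contrasts(g), moderator_contrast(g))
```

Note that the moderator version folds the "other" groups in with White students in the 0 category, which is why it needs only one interaction term with condition.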
Effects by student suspension history. We first tested whether condition effects varied as a function of receiving one or more 2016-2017 discipline referrals that resulted in suspension day(s) (=1) versus no (=0) 2016-2017 discipline referrals that resulted in suspension days. This required a single predictor (1=One or More Prior Suspensions; 0=No Prior Suspensions) interacted with condition. Second, we disaggregated the former category into receiving exactly one discipline referral that resulted in a suspension day(s) versus receiving two or more discipline referrals that resulted in a suspension day(s). This created three categories: students with no, one, and two or more prior suspensions. We tested these as two contrasts, with no suspensions serving as the reference category and each of the other two coded as 1 (One Prior Suspension = 1, Otherwise = 0; Two or More Prior Suspensions = 1, Otherwise = 0).
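The two suspension-history codings above can be sketched as follows, assuming a hypothetical input that counts a student's 2016-2017 discipline referrals resulting in suspension day(s).

```python
def any_prior_suspension(n_referrals):
    """Single predictor: 1 = one or more prior suspensions, 0 = none."""
    return 1 if n_referrals >= 1 else 0


def prior_suspension_contrasts(n_referrals):
    """Two contrasts with no prior suspensions as the reference category.

    Returns (one_prior, two_or_more_prior).
    """
    one_prior = 1 if n_referrals == 1 else 0
    two_or_more_prior = 1 if n_referrals >= 2 else 0
    return one_prior, two_or_more_prior


for n in [0, 1, 2, 5]:
    print(n, any_prior_suspension(n), prior_suspension_contrasts(n))
```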
Effects by student special education status. We used a dichotomous variable for whether students had any disability based on special education status as listed in school records (0=No, student had no disabilities; 1=Yes, student had at least one disability). These disabilities included one or more of 17 disability statuses defined by the district, including physical, academic, social, or emotional behavioral disabilities.

Additional Analyses Tables: Subsequent year missing data (not pre-registered)
It is possible that students who were suspended during the intervention year were more likely to leave the school district. Might this have impacted subsequent year outcomes in this research? To answer this question, we focused on students who were 7th graders in the intervention year, as nearly all students included in the student-level subsequent year analysis were 7th graders (rather than 8th graders) in the intervention year. In addition, leaving the district is more ambiguous for 8th graders, as it could represent the positive outcome of being promoted to high school/grade 9. The models below show that suspensions during the intervention year did predict log-odds of leaving the school district (Table S35), b=0.78 (log-odds), p<.001. In percentage-point terms, students who were suspended in the intervention year had a 5.8 percentage point higher probability of leaving the district in the subsequent year (11.2%) relative to non-suspended students (5.5%). Exploring this further by condition, this phenomenon was primarily an issue for the control condition. There was no significant difference in probability of leaving the district for suspended (M=7.6%) or non-suspended students (M=5.2%) in the treatment condition, b=0.024 (percentage points), p=.28. Yet control-condition students who were suspended (M=13.0%) were significantly more likely than control-condition students who were not suspended (M=5.7%) to leave the district the next school year, b=0.073 (percentage points), p<.001 (Tables S36-S37). However, these simple effects should be interpreted with caution, as the intervention-year suspension status × condition interaction did not reach significance, b=0.51 (log-odds), p=.19.
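As a rough check on how the log-odds coefficient relates to the reported probabilities, the sketch below applies b=0.78 to the non-suspended students' probability of leaving (5.5%). It is an approximation under stated assumptions: it ignores the covariates and multilevel structure of the fitted model and starts from a rounded baseline, so it is not expected to reproduce the model-based 11.2% exactly.

```python
import math


def shift_probability(baseline_prob, log_odds_change):
    """Apply a log-odds change to a baseline probability."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * math.exp(log_odds_change)
    return new_odds / (1 + new_odds)


# Baseline 5.5% (non-suspended students) and b = 0.78, both from the text above.
p_suspended = shift_probability(0.055, 0.78)
print(f"{p_suspended:.1%}")
```

The result is roughly 11%, in line with the reported probability of leaving for suspended students.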
Importantly, this selection process works against the subsequent-year treatment effect. If students who would otherwise have been at high risk of receiving a suspension the next year left the district after 7th grade, in part as a consequence of having a 7th-grade math teacher who was assigned to the control condition, this would depress the control-condition suspension mean in the subsequent year relative to what it would have been if all students had been retained in the district into 8th grade. Thus, the treatment vs control effects we observe in the subsequent year may underestimate the true effect.

Tables

Table S1. Proportions representing the relationships among the three moderators for probability of suspension: students' race-ethnicity, prior-year suspension status, and special-education status.
Note. The table presents the proportion of students with teachers assigned to condition in the ITT sample (N=5,822) in each racial-ethnic group, as well as the proportion of each racial-ethnic group who received at least one suspension in the pre-intervention school year (AY 2016-2017) and the proportion of each racial-ethnic group with any special education status in the intervention school year (AY 2017-2018). In this sample, prior suspension status was available for 5,533 students (15.7% or N=870 received at least one suspension in the prior school year) and special education status was available for 5,821 students (6.0% or N=349 had one or more disabilities in the intervention year).

(N=5,822). In this sample, 5,600 students (ncontrol=3,222, ntreatment=2,378) had one Fall-semester math teacher randomly assigned to condition that semester; 222 students (ncontrol=130, ntreatment=92) had multiple Fall-semester math teachers randomly assigned to condition that semester (though 116 or 52% of these were assigned to the same condition). Contrasts are the difference in means or proportions. Significance is based on logistic regression coefficients (all measures were dichotomous). ***p ≤ .001, **p ≤ .01, *p ≤ .05.

We coded all district-provided race-ethnicities with which a teacher identified according to census definitions. We first coded teachers who identified as Hispanic for at least one ethnicity. We then coded teachers who identified as Black, not Hispanic; followed by White, not Hispanic or Black; followed by Asian, not White, Hispanic, or Black; followed by American Indian, not Black, Hispanic, White, or Asian. "Exclude" teachers taught only or primarily 6th grade classes (N=25) or were non-assigned teachers who only taught students of other assigned teachers (N=2) (see Figure S1). Percentages reflect teachers with available race-ethnicity data for any of the four academic years provided by the district, prioritizing the race-ethnicity reported in the intervention year (2017-2018) if available. The sample size below each group represents all teachers in that group, whether or not (coded unknown) they had available data.

Note. The table reports the condition means for teachers randomly assigned to the treatment or control condition in Fall 2017 for demographic variables (available from district teacher records), suspension-related variables (computed from student discipline records), and grade levels and kinds of math courses taught (computed from district course records). The average suspension rate represents the average probability of suspensions received by the teacher's students (a student who had that teacher at any point during the academic year) at any time during the pre-intervention academic year (the suspension could be issued by any staff member). Linear regression (logistic regression) was used for continuous (dichotomous, indicated by superscript D) indicators. Logistic regression coefficients are represented as odds ratios, the ratio of the odds of having a value of 1 versus 0 on the indicator in the control (none) versus the treatment condition. Demographic indicators and prior suspension rates could not be computed for teachers for whom these records were unavailable. The sample size below each group represents all teachers in that group, whether or not they had available data for a given covariate. ***p≤.001, **p≤.01, *p≤.05

Note. This table compares assigned (teachers who were randomized to condition in Fall 2017) versus non-assigned teachers (teachers who elected not to participate in the study or who were randomized to condition in Spring 2018). Survey variables (childhood hometown, childhood SES, and psychometrics) were unavailable for the vast majority of non-assigned teachers. Specifications are otherwise the same as Table S7. ***p≤.001, **p≤.01, *p≤.05
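The priority order used to code teacher race-ethnicity in the table notes above (Hispanic first, then Black, White, Asian, and American Indian) amounts to a first-match rule. The Python sketch below illustrates it; the function name, the label strings, and the input format (the set of identities listed for a teacher) are assumptions.

```python
# Priority order as described in the table note.
PRIORITY = ["Hispanic", "Black", "White", "Asian", "American Indian"]


def code_teacher_race_ethnicity(identities):
    """Return the first category in PRIORITY that the teacher identified with.

    `identities` is a hypothetical collection of district-provided labels.
    Teachers with no available data are coded "Unknown".
    """
    for category in PRIORITY:
        if category in identities:
            return category
    return "Unknown"


print(code_teacher_race_ethnicity({"White", "Hispanic"}))  # Hispanic
print(code_teacher_race_ethnicity({"Asian", "White"}))     # White
print(code_teacher_race_ethnicity(set()))                  # Unknown
```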

Main Text Analyses Tables (pre-registered)
All models predict the probability of receiving at least one suspension day in the intervention year unless otherwise indicated. All models controlled for the following covariates (not listed in tables for simplicity): mean-centered student race-ethnicity, mean-centered student gender, whether or not the student received a suspension in the prior year (student-level), average suspension rate for students taking the same math course in the prior year district-wide (course-level), average suspension rate among students who had the same math teacher the prior year (teacher-level), and missing value indicators for prior suspensions data at the student-, course-, and teacher-level. All effects tables present the intercept (the treatment-condition mean for a student in the relevant reference category) and the condition difference (control or none vs treatment) from this mean from the corresponding model. When student race-ethnicity, gender, or prior suspension status is the focal moderator, it is coded as 0/1 rather than mean-centered.

Tables

Table S19. Key Regression Coefficients from Model testing the interaction between Control v. Treatment and whether student gender is Female or not in the ITT Sample (non-preregistered)

The model was fit for students with district school records in the intervention year and subsequent academic year, classified according to their 7th grade math teacher(s)' condition assignment during the intervention year. Nearly all such students were 7th graders in the intervention year and 8th graders in the subsequent year. The model excluded students with suspension records for both years who were not exposed to an assigned math teacher in Fall 2017.

Note. N=2,000 students, with 48 different intervention-year teachers. Simple effects were computed using the models shown in Table S33. Students who had an assigned teacher in 2017-2018 were excluded from this analysis.

Additional Gender Analyses
Additional Analyses Tables: Subsequent year missing data (not pre-registered)

Figure S1. Teacher Inclusion Criteria and Condition Assignments
Math teachers who appeared in AY 1718 course records provided by the district (N=200)
  Math teachers who taught only 6th grade classes in AY 1718 (N=25): Exclude
  Math teachers who taught at least one 7th or 8th grade class in AY 1718 (N=175)
    Grade 7/8 math teachers who did not start the S1 survey (N=66): Coded as "No Assignment"
    Grade 7/8 math teachers who appeared in the Qualtrics dataset and started S1 survey (N=109)
      Grade 7/8 math teachers who started the S1 survey AFTER Fall 2017 (N=19): Control (N=8); Treatment (N=7); Missing Condition (N=4); Coded as "No Assignment"
      Grade 7/8 math teachers who started the S1 survey in Fall 2017 (N=90)
        Grade 7/8 math teachers who started the S1 survey in Fall 2017 and received a condition assignment (N=66): Control (N=36); Treatment (N=30)
        Grade 7/8 math teachers who started the S1 survey in Fall 2017 but did not get to randomized content (N=24): Coded as "No Assignment"